Learning binaural spectrogram features for azimuthal speaker localization

Author

  • Wiktor Mlynarski
Abstract

Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extracting features that are informative about the speaker’s position and yet invariant to the sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to binaural speech spectrograms. A small subset of the learned Independent Components (ICs) captures the signal structure imposed by the outer ears. A Gaussian classifier trained on those features performs accurate localization on the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
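The classification stage described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the form of the Gaussian classifier, so the sketch assumes one Gaussian per azimuth with a diagonal covariance and equal class priors. The feature vectors are synthetic stand-ins for IC activations (two position-dependent coefficients and one position-invariant coefficient), and the names `fit_gaussians`, `predict`, and `make_synthetic_data` are hypothetical.

```python
import math
import random


def fit_gaussians(features, labels):
    """Estimate a per-class mean and variance for every feature dimension."""
    grouped = {}
    for x, y in zip(features, labels):
        grouped.setdefault(y, []).append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small floor keeps every variance strictly positive.
        variances = [
            max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
            for d in range(dims)
        ]
        model[y] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    """Log density of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def predict(model, x):
    """Pick the azimuth whose Gaussian gives x the highest likelihood."""
    return max(model, key=lambda y: log_likelihood(x, *model[y]))


def make_synthetic_data(n_per_class, seed=0):
    """Synthetic stand-in for IC activations (assumption, not real data)."""
    rng = random.Random(seed)
    features, labels = [], []
    for azimuth in (-60, 0, 60):
        for _ in range(n_per_class):
            features.append([
                rng.gauss(azimuth / 30.0, 0.5),   # position-dependent
                rng.gauss(-azimuth / 60.0, 0.5),  # position-dependent
                rng.gauss(0.0, 1.0),              # position-invariant
            ])
            labels.append(azimuth)
    return features, labels


train_x, train_y = make_synthetic_data(200, seed=0)
test_x, test_y = make_synthetic_data(100, seed=1)
model = fit_gaussians(train_x, train_y)
hits = sum(predict(model, x) == y for x, y in zip(test_x, test_y))
accuracy = hits / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

In this toy setup the third coefficient has the same distribution at every azimuth, so it contributes almost nothing to the decision. That loosely mirrors the abstract's distinction between the small subset of position-informative ICs and the position-invariant majority.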

Similar resources

Efficient coding of spectrotemporal binaural sounds leads to emergence of the auditory space representation

To date a number of studies have shown that receptive field shapes of early sensory neurons can be reproduced by optimizing coding efficiency of natural stimulus ensembles. A still unresolved question is whether the efficient coding hypothesis explains formation of neurons which explicitly represent environmental features of different functional importance. This paper proposes that the spatial ...

Unsupervised feature learning on monaural DOA estimation using convolutional deep belief networks

In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. Additionally, in the field of sound direction-of-arrival (DOA) estimation, the binaural features like interaural time or phase difference and interaural level difference, or monaural cues like spectral peaks and notches are often used to estimate soun...

Localization dominance in the median-sagittal plane: effect of stimulus duration.

Localization dominance is an aspect of the precedence effect (PE) in which the leading source dominates the perceived location of a simulated echo (lagging source). It is known to be robust in the horizontal/azimuthal dimension, where binaural cues dominate localization. However, little is known about localization dominance in conditions that minimize binaural cues, and most models of precedenc...

Testing the Use of the Binaural Cross-Correlation Coefficient in Azimuthal Sound Localization

Azimuthal sound localization studies were performed on 4 listeners for sounds recorded in two different rooms. The sounds had multiple values of binaural coherence ranging from 0.2 to 0.8 in each room. The sounds were presented to the listener, followed by the same sound with a slight delay in either ear to create an interaural time difference. Previous studies performed with synthetically corr...

Publication date: 2013